Poster: Nemo_bis Date: Jul 25, 2014 7:21am
Forum: opensource Subject: Latin texts mistaken for Russian

I often see Latin texts tagged as Russian by Google Books, for instance the just-imported

Perhaps the error is caused by the ample Ancient Greek passages? Is it worth finding such books and reporting them so that the language can be corrected?
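
If it does turn out to be worth hunting for such books, one rough way to shortlist candidates is to count which scripts the letters in the OCR text belong to, and flag items tagged as Russian that contain almost no Cyrillic. The sketch below is only an illustration under assumptions: the function names, the accepted tag spellings ("russian", "rus", "ru") and the thresholds are all hypothetical, and it only helps if the OCR output kept Latin characters rather than rendering them as Cyrillic lookalikes.

```python
import unicodedata
from collections import Counter


def script_counts(text):
    """Count letters per script, using the Unicode character name prefix."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, spaces, combining marks
        name = unicodedata.name(ch, "")
        for script in ("LATIN", "GREEK", "CYRILLIC"):
            if name.startswith(script):
                counts[script] += 1
                break
        else:
            counts["OTHER"] += 1
    return counts


def looks_mistagged_as_russian(text, tagged_language,
                               min_letters=200, max_cyrillic_share=0.05):
    """Return True if a text tagged as Russian has almost no Cyrillic letters.

    The tag spellings and both thresholds are assumptions for illustration.
    """
    if tagged_language.lower() not in ("russian", "rus", "ru"):
        return False
    counts = script_counts(text)
    total = sum(counts.values())
    if total < min_letters:
        return False  # too little text to judge
    return counts["CYRILLIC"] / total <= max_cyrillic_share
```

Anything this flags would still need a human look before reporting, since a low Cyrillic share only says the tag is suspicious, not what the right language is.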

Granted, this book is clearly a nightmare to OCR in any language, given:
* skewed lines,
* varying font sizes, weird diacritics and stacked letters,
* pages with bites or scratches,
* ink passing through the paper from the other side,
* "dirty" paper (perhaps the Florence flood of 1966 is to blame too).

This post was modified by Nemo_bis on 2014-07-25 14:21:33